Policy-based Sentence Simplification: Replacing Parallel Corpora with LLM-as-a-Judge
Wu, Xuanxin, Arase, Yuki, Nagata, Masaaki
Sentence simplification aims to modify a sentence to make it easier to read and understand while preserving its meaning. Different applications require distinct simplification policies, such as replacing only complex words at the lexical level or rewriting the entire sentence while trading off details for simplicity. However, achieving such policy-driven control remains an open challenge. In this work, we introduce a simple yet powerful approach that leverages Large Language Model-as-a-Judge (LLM-as-a-Judge) to automatically construct policy-aligned training data, completely removing the need for costly human annotation or parallel corpora. Our method enables building simplification systems that adapt to diverse simplification policies.

Sentence simplification could benefit users with reading difficulties, such as second-language (L2) learners and people with reading impairments (e.g., dyslexic individuals), by making text easier to read and understand (Alva-Manchego et al., 2020b). It involves a series of edits, such as lexical paraphrasing, sentence splitting, and removing irrelevant details (Xu et al., 2015). The preferred edit policy, i.e., the permissible or appropriate edits in a given text, varies significantly depending on the target audience. In L2 education, one of the major application areas for simplification, previous work in both NLP and language education research has shown that the desired type and degree of simplification edits change depending on learner proficiency and readability levels (Agrawal et al., 2021; Zhong et al., 2020). Specifically, low- to intermediate-level learners benefit from a combination of lexical paraphrasing, structural modifications, and selective deletions to reduce cognitive load. In contrast, advanced learners benefit from lexical paraphrasing, which supports vocabulary acquisition (Chen, 2019), but they gain comparatively less from added cohesion or deletion (Hosoda, 2016; Zhong et al., 2020).
Motivated by these findings, we introduce two distinct edit policies. As illustrated in Table 1, overall-rewriting simplification often combines lexical paraphrasing, structural modifications, and deletions to improve readability for intermediate-level language learners. In contrast, lexical-paraphrasing (Paetzold & Specia, 2016; Li et al., 2025) adheres to the original sentence closely while supporting more efficient vocabulary acquisition for advanced learners.
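The policy-aligned data construction described above can be sketched as a generate-then-judge filtering loop. The sketch below is illustrative only: `judge` is a stand-in heuristic for an actual LLM-as-a-Judge call, and all function names and thresholds are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of policy-aligned training-data construction via
# LLM-as-a-Judge. judge() is a stub; a real system would prompt an LLM
# with the policy rubric and parse its score.

def judge(source, candidate, policy):
    """Stub judge returning a 0-1 policy-compliance score."""
    src_len = max(len(source.split()), 1)
    ratio = len(candidate.split()) / src_len
    if policy == "lexical-paraphrasing":
        # Reward candidates that stay close to the original sentence.
        return 1.0 if 0.8 <= ratio <= 1.2 else 0.0
    # overall-rewriting: deletions are permissible, so shorter outputs pass.
    return 1.0 if ratio <= 1.0 else 0.0

def build_training_pairs(sources, generate, policy, threshold=0.5):
    """Keep only (source, simplification) pairs the judge accepts."""
    pairs = []
    for src in sources:
        for cand in generate(src):
            if judge(src, cand, policy) >= threshold:
                pairs.append((src, cand))
    return pairs
```

The same candidate pool can thus yield different training sets per policy: a heavy deletion passes under overall-rewriting but is rejected under lexical-paraphrasing.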
JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction
Zhan, Yuhao, Zhang, Yuqing, Yuan, Jing, Ma, Qixiang, Yang, Zhiqi, Gu, Yu, Liu, Zemin, Wu, Fei
Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimated evaluation and restricted model generalization. To address this issue, we introduce the Judge of Edit-Level Validity (JELV), an automated framework to validate correction edits in terms of grammaticality, faithfulness, and fluency. Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as a benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline achieving 90% agreement with human annotators, and a distilled DeBERTa classifier with 85% precision on valid edits. We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric by integrating false positive decoupling and fluency scoring, resulting in state-of-the-art correlation with human judgments. We also apply JELV to filter LLM-generated correction candidates, expanding the BEA19 single-reference dataset containing 38,692 source sentences. Retraining top GEC systems on this expanded dataset yields measurable performance gains. JELV provides a scalable solution for enhancing reference diversity and strengthening both evaluation and model generalization.
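The edit-validity filtering idea can be sketched as follows, assuming (hypothetically) that an edit counts as valid only when each of the three criteria passes a majority vote among judges. The judge functions here are stubs, not the paper's LLM pipeline.

```python
# Illustrative sketch in the spirit of JELV: validate a correction on
# grammaticality, faithfulness, and fluency, then use valid candidates
# to expand the reference set. Each judge is a stub callable.

CRITERIA = ("grammaticality", "faithfulness", "fluency")

def validate_edit(source, corrected, judges):
    """Valid only if every criterion passes by strict majority vote."""
    for criterion in CRITERIA:
        votes = [judge(source, corrected, criterion) for judge in judges]
        if sum(votes) <= len(votes) / 2:
            return False
    return True

def expand_references(source, candidates, judges):
    """Filter LLM-generated correction candidates into extra references."""
    return [c for c in candidates if validate_edit(source, c, judges)]
```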
Dual-branch Prompting for Multimodal Machine Translation
Wang, Jie, Yang, Zhendong, Zong, Liansong, Zhang, Xiaobo, Wang, Dexian, Zhang, Ji
Multimodal Machine Translation (MMT) typically enhances text-only translation by incorporating aligned visual features. Despite the remarkable progress, state-of-the-art MMT approaches often rely on paired image-text inputs at inference and are sensitive to irrelevant visual noise, which limits their robustness and practical applicability. To address these issues, we propose D2P-MMT, a diffusion-based dual-branch prompting framework for robust vision-guided translation. Specifically, D2P-MMT requires only the source text and a reconstructed image generated by a pre-trained diffusion model, which naturally filters out distracting visual details while preserving semantic cues. During training, the model jointly learns from both authentic and reconstructed images using a dual-branch prompting strategy, encouraging rich cross-modal interactions. To bridge the modality gap and mitigate training-inference discrepancies, we introduce a distributional alignment loss that enforces consistency between the output distributions of the two branches. Extensive experiments on the Multi30K dataset demonstrate that D2P-MMT achieves superior translation performance compared to existing state-of-the-art approaches.
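One plausible form of the distributional alignment loss mentioned above is a symmetric KL divergence between the two branches' output distributions; the paper's exact formulation may differ, and this stdlib-only sketch is an assumption for illustration.

```python
import math

# Sketch of a dual-branch consistency term: penalize disagreement between
# the authentic-image branch and the reconstructed-image branch over the
# same vocabulary distribution. A symmetric KL is one common choice.

def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def alignment_loss(p_authentic, p_reconstructed):
    """Symmetric KL between the two branches' output distributions."""
    return 0.5 * (kl(p_authentic, p_reconstructed)
                  + kl(p_reconstructed, p_authentic))
```

The loss is zero when the branches agree exactly and grows as the reconstructed-image branch drifts from the authentic-image branch, which is what lets the model drop the authentic image at inference time.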
ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models
Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, ample evidence indicates that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.
Generative Neural Machine Translation
We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces over-fitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.
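The latent-variable formulation described above admits a standard variational training objective. The decomposition below is a common VAE-style sketch consistent with the stated architecture (latent z generating source x and target y); the paper's exact objective may differ in its factorization.

```latex
\log p(x, y) \;\ge\;
\mathbb{E}_{q(z \mid x, y)}\!\left[\log p(x \mid z) + \log p(y \mid z, x)\right]
\;-\; \mathrm{KL}\!\left(q(z \mid x, y)\,\|\,p(z)\right)
```

Here x is the source sentence, y the target sentence, and z the language-agnostic latent representation; maximizing this bound encourages z to capture sentence meaning shared across languages, which is what supports translation with missing source words and unseen language pairs.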